Overview

Dataset statistics

Number of variables: 14
Number of observations: 197905
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 50.6 MiB
Average record size in memory: 268.0 B

Variable types

Numeric: 8
Categorical: 3
Boolean: 3

Alerts

tran_timestamp has a high cardinality: 720 distinct values (High cardinality)
orig_acct is highly correlated with is_sar (High correlation)
is_sar is highly correlated with orig_acct (High correlation)
initial_deposit_bene is highly correlated with age_bene (High correlation)
age_bene is highly correlated with initial_deposit_bene (High correlation)
tran_id is uniformly distributed (Uniform)
tran_id has unique values (Unique)

Reproduction

Analysis started: 2022-09-04 13:13:25.899519
Analysis finished: 2022-09-04 13:13:59.834044
Duration: 33.93 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json
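
A report like this one could be regenerated along the following lines. This is a minimal sketch, not the exact procedure used for this report: the function name `build_report`, the CSV path and the report title are hypothetical, and it assumes pandas and pandas-profiling v3.x are installed. `ProfileReport`, its `config_file` argument and `to_file` are the library's own entry points.

```python
def build_report(csv_path, html_path, config_file=None):
    """Profile the CSV at csv_path and write an HTML report to html_path.

    csv_path, html_path and the title below are placeholders for illustration.
    """
    # Imported lazily so this sketch can be loaded without the dependencies.
    import pandas as pd
    from pandas_profiling import ProfileReport

    df = pd.read_csv(csv_path)

    kwargs = {"title": "Transaction data profile"}  # hypothetical title
    if config_file is not None:
        # Reuse a previously downloaded configuration, e.g. config.json.
        kwargs["config_file"] = config_file

    profile = ProfileReport(df, **kwargs)
    profile.to_file(html_path)
    return profile
```

A call such as `build_report("transactions.csv", "report.html", config_file="config.json")` would then write the report, where `transactions.csv` is an assumed file name.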

Variables

tran_id
Real number (ℝ≥0)

UNIFORM
UNIQUE

Distinct: 197905
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 98953
Minimum: 1
Maximum: 197905
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.5 MiB

Quantile statistics

Minimum: 1
5-th percentile: 9896.2
Q1: 49477
median: 98953
Q3: 148429
95-th percentile: 188009.8
Maximum: 197905
Range: 197904
Interquartile range (IQR): 98952

Descriptive statistics

Standard deviation: 57130.39685
Coefficient of variation (CV): 0.5773488105
Kurtosis: -1.2
Mean: 98953
Median Absolute Deviation (MAD): 49476
Skewness: 0
Sum: 1.958329346 × 10^10
Variance: 3263882244
Monotonicity: Strictly increasing
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
20491
 
< 0.1%
1829241
 
< 0.1%
334771
 
< 0.1%
396221
 
< 0.1%
375751
 
< 0.1%
601041
 
< 0.1%
580571
 
< 0.1%
642021
 
< 0.1%
621551
 
< 0.1%
519161
 
< 0.1%
Other values (197895)197895
> 99.9%
ValueCountFrequency (%)
11
< 0.1%
21
< 0.1%
31
< 0.1%
41
< 0.1%
51
< 0.1%
61
< 0.1%
71
< 0.1%
81
< 0.1%
91
< 0.1%
101
< 0.1%
ValueCountFrequency (%)
1979051
< 0.1%
1979041
< 0.1%
1979031
< 0.1%
1979021
< 0.1%
1979011
< 0.1%
1979001
< 0.1%
1978991
< 0.1%
1978981
< 0.1%
1978971
< 0.1%
1978961
< 0.1%
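
Because tran_id takes each integer from 1 to 197905 exactly once, its reported statistics can be cross-checked against closed-form expressions. The sketch below does this; the function name `consecutive_id_stats` is made up for illustration, and the use of the sample variance (divisor n - 1) is an assumption, chosen because it is the convention that reproduces the reported variance.

```python
import math


def consecutive_id_stats(n):
    """Closed-form statistics for the integers 1, 2, ..., n."""
    total = n * (n + 1) // 2           # sum of 1..n
    mean = (n + 1) / 2                 # equals the median for this sequence
    sample_variance = n * (n + 1) / 12  # variance with divisor n - 1
    std = math.sqrt(sample_variance)
    return {
        "sum": total,
        "mean": mean,
        "variance": sample_variance,
        "std": std,
        "cv": std / mean,              # coefficient of variation
    }


stats = consecutive_id_stats(197905)
for name, value in stats.items():
    print(name, value)
```

With n = 197905 the sum is 19583293465 and the mean is 98953, in line with the values reported for tran_id.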

orig_acct
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 2090
Distinct (%): 1.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1909.848776
Minimum: 0
Maximum: 12007
Zeros: 103
Zeros (%): 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 1.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 95
Q1: 461
median: 2098
Q3: 2738
95-th percentile: 4711
Maximum: 12007
Range: 12007
Interquartile range (IQR): 2277

Descriptive statistics

Standard deviation: 1618.01312
Coefficient of variation (CV): 0.8471943644
Kurtosis: -0.5802615119
Mean: 1909.848776
Median Absolute Deviation (MAD): 1545
Skewness: 0.5934536229
Sum: 377968622
Variance: 2617966.456
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
2486310
 
0.2%
2584278
 
0.1%
2696207
 
0.1%
2671207
 
0.1%
533207
 
0.1%
2751207
 
0.1%
654206
 
0.1%
392206
 
0.1%
485206
 
0.1%
570206
 
0.1%
Other values (2080)195665
98.9%
ValueCountFrequency (%)
0103
0.1%
1104
0.1%
2103
0.1%
3103
0.1%
4103
0.1%
5103
0.1%
6103
0.1%
7103
0.1%
8111
0.1%
9102
0.1%
ValueCountFrequency (%)
120071
< 0.1%
119901
< 0.1%
119861
< 0.1%
119741
< 0.1%
119681
< 0.1%
118711
< 0.1%
118581
< 0.1%
118401
< 0.1%
118371
< 0.1%
118271
< 0.1%

bene_acct
Real number (ℝ≥0)

Distinct: 4077
Distinct (%): 2.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 569.6764811
Minimum: 0
Maximum: 11991
Zeros: 21
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 1.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 9
Q1: 24
median: 53
Q3: 191
95-th percentile: 4149
Maximum: 11991
Range: 11991
Interquartile range (IQR): 167

Descriptive statistics

Standard deviation: 1695.983714
Coefficient of variation (CV): 2.977099757
Kurtosis: 19.95378574
Mean: 569.6764811
Median Absolute Deviation (MAD): 38
Skewness: 4.364975752
Sum: 112741824
Variance: 2876360.758
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
253563
 
1.8%
203263
 
1.6%
133210
 
1.6%
143103
 
1.6%
273017
 
1.5%
172953
 
1.5%
242917
 
1.5%
232917
 
1.5%
182800
 
1.4%
122682
 
1.4%
Other values (4067)167480
84.6%
ValueCountFrequency (%)
021
 
< 0.1%
1293
 
0.1%
2420
 
0.2%
31342
0.7%
4859
0.4%
51295
0.7%
61378
0.7%
71746
0.9%
81659
0.8%
91563
0.8%
ValueCountFrequency (%)
119911
 
< 0.1%
119901
 
< 0.1%
119741
 
< 0.1%
118763
 
< 0.1%
118711
 
< 0.1%
118581
 
< 0.1%
118401
 
< 0.1%
1182270
< 0.1%
117461
 
< 0.1%
117021
 
< 0.1%

base_amt
Real number (ℝ≥0)

Distinct: 80654
Distinct (%): 40.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 546.6319096
Minimum: 0.09
Maximum: 999.99
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.5 MiB

Quantile statistics

Minimum: 0.09
5-th percentile: 140.992
Q1: 319.58
median: 546.78
Q3: 772.29
95-th percentile: 954.37
Maximum: 999.99
Range: 999.9
Interquartile range (IQR): 452.71

Descriptive statistics

Standard deviation: 261.6711649
Coefficient of variation (CV): 0.4786972006
Kurtosis: -1.197072321
Mean: 546.6319096
Median Absolute Deviation (MAD): 226.38
Skewness: 0.0007397461412
Sum: 108181188.1
Variance: 68471.79854
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
565.5111
 
< 0.1%
247.4710
 
< 0.1%
124.6910
 
< 0.1%
553.7110
 
< 0.1%
988.7710
 
< 0.1%
166.9610
 
< 0.1%
110.2610
 
< 0.1%
855.8310
 
< 0.1%
122.8410
 
< 0.1%
122.799
 
< 0.1%
Other values (80644)197805
99.9%
ValueCountFrequency (%)
0.091
< 0.1%
0.161
< 0.1%
0.252
< 0.1%
0.431
< 0.1%
0.521
< 0.1%
0.611
< 0.1%
0.911
< 0.1%
0.941
< 0.1%
1.371
< 0.1%
1.51
< 0.1%
ValueCountFrequency (%)
999.991
 
< 0.1%
999.983
< 0.1%
999.975
< 0.1%
999.961
 
< 0.1%
999.954
< 0.1%
999.941
 
< 0.1%
999.933
< 0.1%
999.911
 
< 0.1%
999.91
 
< 0.1%
999.894
< 0.1%

tran_timestamp
Categorical

HIGH CARDINALITY

Distinct: 720
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 14.5 MiB
2017-05-24T00:00:00Z
 
363
2017-04-26T00:00:00Z
 
363
2017-02-15T00:00:00Z
 
361
2017-06-14T00:00:00Z
 
359
2017-05-03T00:00:00Z
 
359
Other values (715)
196100 

Length

Max length: 20
Median length: 20
Mean length: 20
Min length: 20

Characters and Unicode

Total characters: 3958100
Distinct characters: 14
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2017-01-01T00:00:00Z
2nd row: 2017-01-01T00:00:00Z
3rd row: 2017-01-01T00:00:00Z
4th row: 2017-01-01T00:00:00Z
5th row: 2017-01-01T00:00:00Z

Common Values

ValueCountFrequency (%)
2017-05-24T00:00:00Z363
 
0.2%
2017-04-26T00:00:00Z363
 
0.2%
2017-02-15T00:00:00Z361
 
0.2%
2017-06-14T00:00:00Z359
 
0.2%
2017-05-03T00:00:00Z359
 
0.2%
2017-02-08T00:00:00Z359
 
0.2%
2017-06-21T00:00:00Z359
 
0.2%
2017-02-01T00:00:00Z359
 
0.2%
2017-04-05T00:00:00Z358
 
0.2%
2017-04-19T00:00:00Z358
 
0.2%
Other values (710)194307
98.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2017-05-24t00:00:00z363
 
0.2%
2017-04-26t00:00:00z363
 
0.2%
2017-02-15t00:00:00z361
 
0.2%
2017-05-03t00:00:00z359
 
0.2%
2017-02-08t00:00:00z359
 
0.2%
2017-06-21t00:00:00z359
 
0.2%
2017-02-01t00:00:00z359
 
0.2%
2017-06-14t00:00:00z359
 
0.2%
2017-01-04t00:00:00z358
 
0.2%
2017-02-22t00:00:00z358
 
0.2%
Other values (710)194307
98.2%

Most occurring characters

ValueCountFrequency (%)
01633970
41.3%
-395810
 
10.0%
:395810
 
10.0%
1364403
 
9.2%
2310090
 
7.8%
T197905
 
5.0%
Z197905
 
5.0%
7155858
 
3.9%
8114761
 
2.9%
346883
 
1.2%
Other values (4)144705
 
3.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number2770670
70.0%
Dash Punctuation395810
 
10.0%
Other Punctuation395810
 
10.0%
Uppercase Letter395810
 
10.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
01633970
59.0%
1364403
 
13.2%
2310090
 
11.2%
7155858
 
5.6%
8114761
 
4.1%
346883
 
1.7%
537164
 
1.3%
436940
 
1.3%
636153
 
1.3%
934448
 
1.2%
Uppercase Letter
ValueCountFrequency (%)
T197905
50.0%
Z197905
50.0%
Dash Punctuation
ValueCountFrequency (%)
-395810
100.0%
Other Punctuation
ValueCountFrequency (%)
:395810
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common3562290
90.0%
Latin395810
 
10.0%

Most frequent character per script

Common
ValueCountFrequency (%)
01633970
45.9%
-395810
 
11.1%
:395810
 
11.1%
1364403
 
10.2%
2310090
 
8.7%
7155858
 
4.4%
8114761
 
3.2%
346883
 
1.3%
537164
 
1.0%
436940
 
1.0%
Other values (2)70601
 
2.0%
Latin
ValueCountFrequency (%)
T197905
50.0%
Z197905
50.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII3958100
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
01633970
41.3%
-395810
 
10.0%
:395810
 
10.0%
1364403
 
9.2%
2310090
 
7.8%
T197905
 
5.0%
Z197905
 
5.0%
7155858
 
3.9%
8114761
 
2.9%
346883
 
1.2%
Other values (4)144705
 
3.7%

is_sar
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 193.4 KiB
False
197234 
True
 
671
ValueCountFrequency (%)
False197234
99.7%
True671
 
0.3%

prior_sar_count_orig
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 193.4 KiB
False
174324 
True
23581 
ValueCountFrequency (%)
False174324
88.1%
True23581
 
11.9%

initial_deposit_orig
Real number (ℝ≥0)

Distinct: 2090
Distinct (%): 1.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 75597.28015
Minimum: 50009.28
Maximum: 99999.31
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.5 MiB

Quantile statistics

Minimum: 50009.28
5-th percentile: 53209.65
Q1: 63514.54
median: 76142.4
Q3: 87290.86
95-th percentile: 97289.61
Maximum: 99999.31
Range: 49990.03
Interquartile range (IQR): 23776.32

Descriptive statistics

Standard deviation: 13970.57272
Coefficient of variation (CV): 0.1848025841
Kurtosis: -1.157314352
Mean: 75597.28015
Median Absolute Deviation (MAD): 11870.37
Skewness: -0.04921636702
Sum: 1.496107973 × 10^10
Variance: 195176902.1
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
87199.11310
 
0.2%
62288.74278
 
0.1%
90983.71207
 
0.1%
92422.42207
 
0.1%
83668.85207
 
0.1%
77381.5207
 
0.1%
68754.28206
 
0.1%
92961.44206
 
0.1%
97514.84206
 
0.1%
75858.51206
 
0.1%
Other values (2080)195665
98.9%
ValueCountFrequency (%)
50009.281
 
< 0.1%
500501
 
< 0.1%
50058.695
< 0.1%
50060.391
 
< 0.1%
50110.37103
0.1%
50255.4994
< 0.1%
50261.31108
0.1%
50291.361
 
< 0.1%
50295.9192
< 0.1%
50316.89130
0.1%
ValueCountFrequency (%)
99999.31103
0.1%
99951.81103
0.1%
99942.06103
0.1%
99932.76198
0.1%
99928.3197
0.1%
99915.41
 
< 0.1%
99870.09103
0.1%
99857.44103
0.1%
99703.4103
0.1%
99681.71
 
< 0.1%

gender_orig
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 11.7 MiB
Female
100200 
Male
97705 

Length

Max length: 6
Median length: 6
Mean length: 5.012607059
Min length: 4

Characters and Unicode

Total characters: 992020
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Female
2nd row: Male
3rd row: Female
4th row: Female
5th row: Female

Common Values

ValueCountFrequency (%)
Female100200
50.6%
Male97705
49.4%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
female100200
50.6%
male97705
49.4%

Most occurring characters

ValueCountFrequency (%)
e298105
30.1%
a197905
19.9%
l197905
19.9%
F100200
 
10.1%
m100200
 
10.1%
M97705
 
9.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter794115
80.1%
Uppercase Letter197905
 
19.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e298105
37.5%
a197905
24.9%
l197905
24.9%
m100200
 
12.6%
Uppercase Letter
ValueCountFrequency (%)
F100200
50.6%
M97705
49.4%

Most occurring scripts

ValueCountFrequency (%)
Latin992020
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e298105
30.1%
a197905
19.9%
l197905
19.9%
F100200
 
10.1%
m100200
 
10.1%
M97705
 
9.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII992020
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e298105
30.1%
a197905
19.9%
l197905
19.9%
F100200
 
10.1%
m100200
 
10.1%
M97705
 
9.8%

prior_sar_count_bene
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 193.4 KiB
False
162053 
True
35852 
ValueCountFrequency (%)
False162053
81.9%
True35852
 
18.1%

initial_deposit_bene
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 4077
Distinct (%): 2.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 74310.10622
Minimum: 50001.55
Maximum: 99999.31
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.5 MiB

Quantile statistics

Minimum: 50001.55
5-th percentile: 52405.77
Q1: 62535.57
median: 71210.53
Q3: 88048.5
95-th percentile: 97896.53
Maximum: 99999.31
Range: 49997.76
Interquartile range (IQR): 25512.93

Descriptive statistics

Standard deviation: 14760.15747
Coefficient of variation (CV): 0.198629207
Kurtosis: -1.211244726
Mean: 74310.10622
Median Absolute Deviation (MAD): 13116.03
Skewness: 0.1628954081
Sum: 1.470634157 × 10^10
Variance: 217862248.5
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
67858.543563
 
1.8%
51562.383263
 
1.6%
64631.33210
 
1.6%
97896.533103
 
1.6%
56983.453017
 
1.5%
62535.572953
 
1.5%
89199.142917
 
1.5%
91434.672917
 
1.5%
62618.62800
 
1.4%
52909.532682
 
1.4%
Other values (4067)167480
84.6%
ValueCountFrequency (%)
50001.552
 
< 0.1%
50003.895
 
< 0.1%
50020.255
 
< 0.1%
50028.145
 
< 0.1%
50034.041
 
< 0.1%
500502
 
< 0.1%
50057.78
< 0.1%
50058.61
 
< 0.1%
50066.612
 
< 0.1%
50070.5417
< 0.1%
ValueCountFrequency (%)
99999.317
 
< 0.1%
99994.472
 
< 0.1%
99984.6810
 
< 0.1%
99961.382
 
< 0.1%
99951.814
 
< 0.1%
99932.7614
 
< 0.1%
99928.311
 
< 0.1%
99918.74107
0.1%
99915.41
 
< 0.1%
99868.47168
0.1%

gender_bene
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 11.7 MiB
Female
100345 
Male
97560 

Length

Max length: 6
Median length: 6
Mean length: 5.014072408
Min length: 4

Characters and Unicode

Total characters: 992310
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Female
2nd row: Female
3rd row: Male
4th row: Female
5th row: Female

Common Values

ValueCountFrequency (%)
Female100345
50.7%
Male97560
49.3%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
female100345
50.7%
male97560
49.3%

Most occurring characters

ValueCountFrequency (%)
e298250
30.1%
a197905
19.9%
l197905
19.9%
F100345
 
10.1%
m100345
 
10.1%
M97560
 
9.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter794405
80.1%
Uppercase Letter197905
 
19.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e298250
37.5%
a197905
24.9%
l197905
24.9%
m100345
 
12.6%
Uppercase Letter
ValueCountFrequency (%)
F100345
50.7%
M97560
49.3%

Most occurring scripts

ValueCountFrequency (%)
Latin992310
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e298250
30.1%
a197905
19.9%
l197905
19.9%
F100345
 
10.1%
m100345
 
10.1%
M97560
 
9.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII992310
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e298250
30.1%
a197905
19.9%
l197905
19.9%
F100345
 
10.1%
m100345
 
10.1%
M97560
 
9.8%

age_orig
Real number (ℝ≥0)

Distinct: 117
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 59.89496476
Minimum: 0
Maximum: 116
Zeros: 801
Zeros (%): 0.4%
Negative: 0
Negative (%): 0.0%
Memory size: 1.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 7
Q1: 31
median: 62
Q3: 90
95-th percentile: 110
Maximum: 116
Range: 116
Interquartile range (IQR): 59

Descriptive statistics

Standard deviation: 33.73379436
Coefficient of variation (CV): 0.5632158646
Kurtosis: -1.221810746
Mean: 59.89496476
Median Absolute Deviation (MAD): 30
Skewness: -0.07009563801
Sum: 11853513
Variance: 1137.968882
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
1142743
 
1.4%
1092591
 
1.3%
522575
 
1.3%
822539
 
1.3%
732529
 
1.3%
862495
 
1.3%
122493
 
1.3%
102478
 
1.3%
642420
 
1.2%
692359
 
1.2%
Other values (107)172683
87.3%
ValueCountFrequency (%)
0801
 
0.4%
12232
1.1%
21474
0.7%
3961
0.5%
4952
0.5%
51584
0.8%
61384
0.7%
71885
1.0%
81141
0.6%
91729
0.9%
ValueCountFrequency (%)
1161018
 
0.5%
1151096
 
0.6%
1142743
1.4%
1131248
0.6%
1121740
0.9%
1112016
1.0%
1101944
1.0%
1092591
1.3%
1081653
0.8%
1072226
1.1%

age_bene
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 117
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 58.44603724
Minimum: 0
Maximum: 116
Zeros: 632
Zeros (%): 0.3%
Negative: 0
Negative (%): 0.0%
Memory size: 1.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 7
Q1: 31
median: 61
Q3: 83
95-th percentile: 108
Maximum: 116
Range: 116
Interquartile range (IQR): 52

Descriptive statistics

Standard deviation: 31.93012218
Coefficient of variation (CV): 0.5463180001
Kurtosis: -1.125626711
Mean: 58.44603724
Median Absolute Deviation (MAD): 25
Skewness: -0.0850688605
Sum: 11566763
Variance: 1019.532702
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
236248
 
3.2%
785746
 
2.9%
545559
 
2.8%
515535
 
2.8%
744400
 
2.2%
494396
 
2.2%
804395
 
2.2%
734285
 
2.2%
644132
 
2.1%
753964
 
2.0%
Other values (107)149245
75.4%
ValueCountFrequency (%)
0632
 
0.3%
1750
 
0.4%
22786
1.4%
3553
 
0.3%
41429
0.7%
52441
1.2%
6836
 
0.4%
71415
0.7%
82493
1.3%
9347
 
0.2%
ValueCountFrequency (%)
116430
 
0.2%
115788
 
0.4%
114497
 
0.3%
113415
 
0.2%
112410
 
0.2%
1112302
1.2%
110782
 
0.4%
1092445
1.2%
1083026
1.5%
107679
 
0.3%

Interactions

[Figures: 64 pairwise interaction plots]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic correlations. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and 1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
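
The calculation described above can be sketched in plain Python as follows. This is an illustrative implementation, not the code the report itself runs; the helper names `average_ranks` and `spearman_rho` are made up here, and ties are given their average rank, which is an assumption about tie handling.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Covariance of the rank variables divided by the product of their standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when a variable is constant")
    # The 1/(n - 1) factors of covariance and variances cancel out.
    return cov / math.sqrt(var_x * var_y)
```

For example, `spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])` is 1.0 because the relationship is monotonic, even though it is not linear.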

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and 1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
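
A minimal sketch of this calculation in plain Python is given below. The function name `pearson_r` is chosen for illustration only, and the code is not taken from the profiling library.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    # The 1/(n - 1) factors of covariance and variances cancel out.
    return cov / math.sqrt(var_x * var_y)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(pearson_r(x, y))
# Shifting and rescaling x by a positive factor leaves r unchanged
# (up to floating-point rounding).
print(pearson_r([2 * a + 5 for a in x], y))
```

The second call illustrates the invariance under changes in location and scale mentioned above.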

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and 1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
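
The pair-counting definition above can be sketched as follows. It implements exactly the formula stated in the text (often called tau-a), with tied pairs counted as neither concordant nor discordant; library implementations commonly apply a tie correction instead, so results on data with ties may differ. The function name `kendall_tau_a` is illustrative.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
        # direction == 0 means a tie in x or y; the pair counts for neither
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs
```

This direct version compares every pair, so its cost grows quadratically with the number of observations; it is meant to show the definition rather than to be run on a dataset of this size.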

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
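
A bias-corrected Cramér's V can be sketched in plain Python as below. This follows the commonly cited form of Bergsma's correction and is an illustration under that assumption, not the library's own code; the function name `cramers_v_corrected` is made up for this example.

```python
import math
from collections import Counter


def cramers_v_corrected(a, b):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("a and b must have the same length (at least 2)")
    joint = Counter(zip(a, b))
    row_totals = Counter(a)
    col_totals = Counter(b)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for row, n_row in row_totals.items():
        for col, n_col in col_totals.items():
            expected = n_row * n_col / n
            observed = joint.get((row, col), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0  # a variable with a single category carries no association
    return math.sqrt(phi2_corr / denom)
```

Two identical label sequences with two balanced categories give a value of 1, while two sequences whose joint counts equal the product of their marginals give 0.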

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for it.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.

Sample

First rows

tran_id orig_acct bene_acct base_amt tran_timestamp is_sar prior_sar_count_orig initial_deposit_orig gender_orig prior_sar_count_bene initial_deposit_bene gender_bene age_orig age_bene
014376170885.302017-01-01T00:00:00ZFalseFalse63446.28FemaleFalse84168.61Female10643
12430023630.412017-01-01T00:00:00ZFalseFalse79684.15MaleFalse89199.14Female5841
23443312393.142017-01-01T00:00:00ZFalseFalse64630.28FemaleFalse52909.53Male11451
3425526503659.742017-01-01T00:00:00ZFalseFalse79188.34FemaleFalse57537.45Female3527
4525526503442.442017-01-01T00:00:00ZFalseFalse79188.34FemaleFalse57537.45Female3527
5628175140.062017-01-01T00:00:00ZFalseTrue87074.80FemaleFalse50335.69Male7853
67553400612.572017-01-01T00:00:00ZFalseFalse76295.34FemaleFalse85853.72Female10824
78240990665.622017-01-01T00:00:00ZFalseFalse76874.34MaleFalse70044.46Male8860
89668132970.502017-01-01T00:00:00ZFalseFalse60794.76FemaleFalse60784.17Male1814
9102085104945.462017-01-01T00:00:00ZFalseFalse88190.98MaleTrue67881.53Male6278

Last rows

tran_id orig_acct bene_acct base_amt tran_timestamp is_sar prior_sar_count_orig initial_deposit_orig gender_orig prior_sar_count_bene initial_deposit_bene gender_bene age_orig age_bene
19789519789644364475.662018-12-21T00:00:00ZFalseFalse53563.64FemaleFalse91528.28Male1560
19789619789746516029632.002018-12-21T00:00:00ZFalseFalse74115.62MaleFalse95516.71Male137
19789719789826514188.372018-12-21T00:00:00ZFalseFalse50781.53MaleFalse97896.53Female11554
19789819789910513116.022018-12-21T00:00:00ZFalseTrue82521.54FemaleTrue64631.30Male7849
197899197900675270502.132018-12-21T00:00:00ZFalseFalse70050.90FemaleTrue64211.03Female11557
197900197901294851675.302018-12-21T00:00:00ZFalseFalse73456.24MaleFalse71122.41Male4911
1979011979022767688864.552018-12-21T00:00:00ZFalseFalse58993.69FemaleFalse64648.90Male11554
197902197903885198682.872018-12-21T00:00:00ZFalseFalse51234.09FemaleTrue85152.71Female78114
1979031979044278639780.682018-12-21T00:00:00ZFalseFalse66156.31MaleFalse96278.77Male2510
19790419790550142459.012018-12-21T00:00:00ZFalseFalse88951.73FemaleFalse70573.82Male103111